220 research outputs found

    Replicable Evaluation of Recommender Systems

    This is the author's version of the work. It is posted here for your personal use, not for redistribution. The definitive Version of Record was published in RecSys '15, Proceedings of the 9th ACM Conference on Recommender Systems, http://dx.doi.org/10.1145/2792838.2792841.
    Recommender systems research is by and large based on comparisons of recommendation algorithms' predictive accuracies: the better the evaluation metrics (higher accuracy scores or lower predictive errors), the better the recommendation algorithm. Comparing the evaluation results of two recommendation approaches is, however, a difficult process, as many factors must be considered in the implementation of an algorithm, its evaluation, and how datasets are processed and prepared. This tutorial shows how to present evaluation results in a clear and concise manner, while ensuring that the results are comparable, replicable, and unbiased. These insights are not limited to recommender systems research alone, but are also valid for experiments with other types of personalized interactions and contextual information access.
    Supported in part by the Ministerio de Educación y Ciencia (TIN2013-47090-C3-2).
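
    To make the notion of a replicable evaluation concrete, here is a minimal sketch (ours, not from the tutorial; the seed, hold-out ratio, and cutoff are assumed for illustration) in which each source of variation the tutorial warns about (data splitting, metric definition, random seed) is pinned down explicitly, so the reported number can be regenerated exactly.

        import random
        from collections import defaultdict

        SEED = 42          # fixed seed so the split is identical on every run
        TEST_RATIO = 0.2   # assumed hold-out ratio, reported together with the results
        K = 10             # assumed cutoff for precision@K

        def split_ratings(ratings, seed=SEED, test_ratio=TEST_RATIO):
            """Deterministic per-user hold-out split; `ratings` is a list of (user, item, rating)."""
            rng = random.Random(seed)
            by_user = defaultdict(list)
            for user, item, rating in ratings:
                by_user[user].append((item, rating))
            train, test = [], []
            for user, items in by_user.items():
                rng.shuffle(items)
                cut = max(1, int(len(items) * test_ratio))
                test += [(user, i, r) for i, r in items[:cut]]
                train += [(user, i, r) for i, r in items[cut:]]
            return train, test

        def precision_at_k(recommended, relevant, k=K):
            """Fraction of the top-k recommended items that appear in the relevant set."""
            return sum(1 for item in recommended[:k] if item in relevant) / k

    Reporting the seed, the split ratio, and the exact metric definition alongside the score is what makes a result of this kind comparable across papers.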

    Comparative recommender system evaluation: Benchmarking recommendation frameworks

    This is the author's version of the work. It is posted here for your personal use, not for redistribution. The definitive Version of Record was published in RecSys '14, Proceedings of the 8th ACM Conference on Recommender Systems, http://dx.doi.org/10.1145/2645710.2645746.
    Recommender systems research is often based on comparisons of predictive accuracy: the better the evaluation scores, the better the recommender. However, it is difficult to compare results from different recommender systems due to the many options in design and implementation of an evaluation strategy. Additionally, algorithmic implementations can diverge from the standard formulation due to manual tuning and modifications that work better in some situations. In this work we compare common recommendation algorithms as implemented in three popular recommendation frameworks. To provide a fair comparison, we have complete control of the evaluation dimensions being benchmarked: dataset, data splitting, evaluation strategies, and metrics. We also include results using the internal evaluation mechanisms of these frameworks. Our analysis points to large differences in recommendation accuracy across frameworks and strategies, i.e. the same baselines may perform orders of magnitude better or worse across frameworks. Our results show the necessity of clear guidelines when reporting evaluation of recommender systems to ensure reproducibility and comparison of results.
    This work was partly carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements no. 246016 and no. 610594, and the Spanish Ministry of Science and Innovation (TIN2013-47090-C3-2).
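
    The "complete control" of the evaluation dimensions described above can be pictured as a harness in which the data split and the metric live outside any framework. The sketch below is ours, with a toy popularity baseline standing in for the framework implementations; the same evaluate() function is applied to whatever recommender is plugged in.

        from typing import Callable, Dict, List, Set, Tuple

        Rating = Tuple[str, str, float]   # (user, item, rating)

        def popularity_recommender(train: List[Rating]) -> Callable[[str, int], List[str]]:
            """Toy baseline standing in for a framework implementation:
            recommend the globally most-rated items to every user."""
            counts: Dict[str, int] = {}
            for _, item, _ in train:
                counts[item] = counts.get(item, 0) + 1
            ranked = sorted(counts, key=counts.get, reverse=True)
            return lambda user, n: ranked[:n]

        def evaluate(recommend: Callable[[str, int], List[str]], test: List[Rating], n: int = 10) -> float:
            """Precision@n computed outside any framework, on a shared test split."""
            relevant: Dict[str, Set[str]] = {}
            for user, item, _ in test:
                relevant.setdefault(user, set()).add(item)
            scores = []
            for user, items in relevant.items():
                hits = len(set(recommend(user, n)) & items)
                scores.append(hits / n)
            return sum(scores) / len(scores) if scores else 0.0

    Because the split and the metric are fixed here rather than inside each framework, any difference in the reported numbers can be attributed to the algorithm implementations themselves.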

    Improving accountability in recommender systems research through reproducibility

    Reproducibility is a key requirement for scientific progress. It allows the work of others to be reproduced and, as a consequence, the reported claims and results to be fully trusted. In this work, we argue that, by facilitating reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspectives of practitioners, designers, and engineers aiming to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving toward fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human–Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms making automated decisions for and around us. This work surveys existing definitions of these concepts and proposes a coherent terminology for recommender systems research, with the goal of connecting reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several recommender system implementations available in the literature and discuss the extent to which they fit within the introduced framework. With this work, we aim to shed light on this important problem and facilitate progress in the field by increasing the accountability of research.
    This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: PID2019-108965GB-I00).
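
    One concrete step of the kind such guidelines point toward (the snippet is ours, not the paper's framework; file names and fields are illustrative) is to archive a machine-readable manifest of everything an external reviewer would need to re-run and audit an experiment.

        import hashlib
        import json
        import platform
        import sys

        def experiment_manifest(dataset_path: str, seed: int, hyperparameters: dict) -> dict:
            """Collect the details needed to reproduce and audit a single experimental run."""
            with open(dataset_path, "rb") as f:
                dataset_sha256 = hashlib.sha256(f.read()).hexdigest()
            return {
                "python_version": sys.version,
                "platform": platform.platform(),
                "dataset_sha256": dataset_sha256,
                "random_seed": seed,
                "hyperparameters": hyperparameters,
            }

        # Illustrative usage: store the manifest alongside the reported results.
        # with open("manifest.json", "w") as out:
        #     json.dump(experiment_manifest("ratings.csv", seed=42,
        #                                   hyperparameters={"factors": 50}), out, indent=2)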

    Coherence and Inconsistencies in Rating Behavior - Estimating the Magic Barrier of Recommender Systems

    Recommender systems have to deal with a wide variety of users and user types that express their preferences in different ways. This difference in user behavior can have a profound impact on the performance of the recommender system. Users receive better (or worse) recommendations depending on the quantity and the quality of the information the system knows about them. Specifically, the inconsistencies in users' preferences impose a lower bound on the error the system may achieve when predicting ratings for one particular user; this is referred to as the magic barrier. In this work, we present a mathematical characterization of the magic barrier based on the assumption that user ratings are afflicted with inconsistencies (noise). Furthermore, we propose a measure of the consistency of user ratings (rating coherence) that predicts the performance of recommendation methods. More specifically, we show that user coherence is correlated with the magic barrier; we exploit this correlation to discriminate between easy users (those with a lower magic barrier) and difficult ones (those with a higher magic barrier). We report experiments where the recommendation error for the more coherent users is lower than that of the less coherent ones. We further validate these results by using two public datasets, where the necessary data to identify the magic barrier is not available, in which we obtain similar performance improvements.
    This research was in part supported by the Spanish Ministry of Economy, Industry and Competitiveness (TIN2016-80630-P).
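
    One standard way to formalize the noise assumption in this abstract (the notation here is ours and may differ from the paper's) models each observed rating as a true preference plus zero-mean noise, which immediately yields the magic barrier as a lower bound on the achievable RMSE:

        % Observed rating = true preference + zero-mean inconsistency (noise)
        r_{ui} = \pi_{ui} + \varepsilon_{ui}, \qquad \mathbb{E}\left[\varepsilon_{ui}\right] = 0
        % No predictor \hat{r}_{ui} can do better than the noise level for user u:
        \mathrm{RMSE}_u(\hat{r}) \;\geq\; \mathcal{B}_u = \sqrt{\mathbb{E}\left[\varepsilon_{ui}^{2}\right]}

    Under such a model, rating coherence serves as an observable proxy for the (unobservable) noise variance, which is what allows easy users to be separated from difficult ones.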

    Workshop on reproducibility and replication in recommender systems evaluation - RepSys

    This is the author's version of the work. It is posted here for your personal use, not for redistribution. The definitive Version of Record was published in RecSys '13, Proceedings of the 7th ACM Conference on Recommender Systems, http://dx.doi.org/10.1145/2507157.2508006.
    Experiment replication and reproduction are key requirements for empirical research methodology, and an important open issue in the field of Recommender Systems. When an experiment is repeated by a different researcher and exactly the same result is obtained, we can say the experiment has been replicated. When the results are not exactly the same but the conclusions are compatible with the prior ones, we have a reproduction of the experiment. Reproducibility and replication involve recommendation algorithm implementations, experimental protocols, and evaluation metrics. While the problem of reproducibility and replication has been recognized in the Recommender Systems community, the need for a clear solution remains largely unmet, which motivates the present workshop.
    This workshop was carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme, funded by European Commission FP7 grant agreement no. 246016.

    Knowledge management and educational competitiveness at the Escuela de Infantería del Ejército Peruano, Chorrillos, 2017

    This study aimed to relate knowledge management and educational competitiveness as perceived by the personnel of the Escuela de Infantería - Chorrillos, 2017. The research was descriptive in scope, with a non-experimental, cross-sectional, correlational design; the sample population consisted of 102 members of the aforementioned institution, who were administered two questionnaires, one on knowledge management and the other on educational competitiveness. The questionnaires were subjected to validation tests, and the study found a significant relationship between knowledge management and educational competitiveness as perceived by the personnel of the Escuela de Infantería - Chorrillos, 2017, obtaining a significance level of 0.00, below the 0.05 threshold.

    A comparative study of the right to education of Syrian nationals under international protection in Argentina and Ecuador, 2017-2020

    In 2011, the Syrian armed conflict began in the Middle East, displacing millions of people both internally and externally. These migratory flows have moved toward countries such as Turkey, Germany, France, and Latin America, and they include children and adolescents in situations of vulnerability; some of these Syrian families arrived in Ecuador and Argentina. One of the main problems is that States do not always have adequate mechanisms to address the specific needs of extra-continental populations. The purpose of this study is to understand how Ecuador and Argentina address the right to education of Syrian children and adolescents in need of international protection. To this end, information obtained from interviews with Syrian families residing in Ecuador and Argentina was analyzed and contrasted with data from secondary sources such as national and international legislation, reports, research, documentaries, and personal accounts. Participants were contacted through the religious communities they attend. Argentina launched a programme focused on the Syrian population and established a planned and monitored course of action, with positive results, showing that it guarantees minimum standards regarding the right to education, although its limited period of execution is a constraint. Ecuador did not launch any programme focused on the Syrian population; access to education is nevertheless not prevented, but difficulties are evident with respect to acceptability and adaptability, which underlines the importance of acting in a way that recognizes the diversity of the population and its needs. The compliance observed in the Argentine case corresponds to non-formal education centred on its recipients and their needs. Education requires a rationale that exists for and because of the rights of those who migrate forcibly, breaking away from the utilitarian and human-capital approach and focusing instead on their bio-psycho-social needs.

    A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems

    The evaluation of recommender systems is crucial for their development. In today's recommendation landscape there are many standardized recommendation algorithms and approaches; however, there exists no standardized method for the experimental setup of evaluation, not even for widely used measures such as precision and root-mean-squared error. This creates a setting where the comparison of recommendation results using the same datasets becomes problematic. In this paper, we propose an evaluation protocol specifically developed with the recommendation use case in mind, i.e. the recommendation of one or several items to an end user. The protocol attempts to closely mimic a scenario of a deployed (production) recommendation system, taking specific user aspects into consideration and allowing a comparison of small and large scale recommendation scenarios.
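
    A protocol in this spirit can be sketched as follows (this is our illustration of a deployment-like top-N evaluation, not the exact protocol from the paper; the parameter names and the choice of recall@n are assumptions): each user is scored against every item they have not yet interacted with, exactly as a production system would score its catalogue, and only the top n of that ranking is judged against the held-out items.

        from typing import Dict, Set

        def deployed_style_topn_eval(
            scores: Dict[str, Dict[str, float]],   # user -> item -> predicted score
            train_items: Dict[str, Set[str]],      # items each user interacted with in training
            test_items: Dict[str, Set[str]],       # held-out items treated as relevant
            catalogue: Set[str],
            n: int = 10,
        ) -> float:
            """Rank every item the user has not seen in training (as a live system would),
            keep the top n, and measure how many held-out relevant items were retrieved."""
            recalls = []
            for user, relevant in test_items.items():
                candidates = catalogue - train_items.get(user, set())
                ranked = sorted(candidates,
                                key=lambda i: scores.get(user, {}).get(i, 0.0),
                                reverse=True)
                hits = len(set(ranked[:n]) & relevant)
                recalls.append(hits / len(relevant) if relevant else 0.0)
            return sum(recalls) / len(recalls) if recalls else 0.0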

    Information Retrieval and User-Centric Recommender System Evaluation

    Traditional recommender system evaluation focuses on raising the accuracy, or lowering the rating prediction error, of the recommendation algorithm. Recently, however, discrepancies between commonly used metrics (e.g. precision, recall, root-mean-square error) and the quality experienced by users have been brought to light. This project aims to address these discrepancies by developing novel means of recommender system evaluation that encompass both the qualities identified through traditional evaluation metrics and user-centric factors such as diversity, serendipity, and novelty, as well as by bringing further insight into the topic by analyzing and translating the problem of evaluation from an Information Retrieval perspective.
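
    As a small illustration of the user-centric factors mentioned above (the definitions below are common textbook formulations, not necessarily the ones this project adopts), intra-list diversity and novelty can be computed directly from a recommendation list:

        import math
        from typing import Dict, List, Set

        def intra_list_diversity(recommended: List[str], item_features: Dict[str, Set[str]]) -> float:
            """Average pairwise dissimilarity (1 - Jaccard similarity of item features) within the list."""
            total, pairs = 0.0, 0
            for idx, a in enumerate(recommended):
                for b in recommended[idx + 1:]:
                    fa, fb = item_features.get(a, set()), item_features.get(b, set())
                    jaccard = len(fa & fb) / len(fa | fb) if fa | fb else 0.0
                    total += 1.0 - jaccard
                    pairs += 1
            return total / pairs if pairs else 0.0

        def novelty(recommended: List[str], popularity: Dict[str, float]) -> float:
            """Mean self-information -log2(p(item)); long-tail items make the list more novel.
            Items whose popularity is unknown or zero are skipped."""
            infos = [-math.log2(popularity[i]) for i in recommended if popularity.get(i, 0) > 0]
            return sum(infos) / len(infos) if infos else 0.0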